IUPHAR/BPS Guide to Pharmacology (Pawson et al. 2014). Finally, the protein-protein interaction data, disease gene data, and drug target data were combined to calculate the network distance between each drug and a given disease (Cheng et al. 2018).
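The network distance calculation can be illustrated with a minimal sketch. The text does not say which distance measure was used, so the code below assumes one common choice, the "closest" measure: for each drug target, take the shortest-path length to the nearest disease gene in the protein-protein interaction network, then average over the drug's targets. The toy network, the node names, and the helper names (`build_graph`, `shortest_path_lengths`, `closest_distance`) are hypothetical and serve only as illustration.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_distance(graph, drug_targets, disease_genes):
    """Average, over drug targets, of the distance to the nearest disease gene.

    Assumed measure (not confirmed by the text). Targets that cannot reach
    any disease gene are skipped; if none can, the distance is infinite.
    """
    total, counted = 0, 0
    for target in drug_targets:
        dist = shortest_path_lengths(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if reachable:
            total += min(reachable)
            counted += 1
    return total / counted if counted else float("inf")


# Hypothetical toy interactome: A-B-C-D-E chain with a side branch B-F.
ppi = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "F")])
print(closest_distance(ppi, {"A", "C"}, {"D", "E"}))  # 2.0
```

In the toy network, target A is three steps from the nearest disease gene (D) and target C is one step away, giving an average of 2.0. A real analysis would also need to compare this value against a reference distribution, for example from randomly chosen protein sets, before interpreting it.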

4.3.4 Chemical Similarity

The similarity ensemble approach (Keiser et al. 2007) compares target proteins by the chemical similarity of the ligands that bind to them. The similarity is expressed as an expectation value (e-value), adapting the statistics of the basic local alignment search tool (BLAST) algorithms (Altschul et al. 1990; Hert et al. 2008). The structural similarity between each drug and each target's ligand set was quantified as an e-value using the similarity ensemble approach (Keiser et al. 2007). The approach can be used to search large ligand databases quickly and to build similarity maps among target proteins at large scale. It differs from traditional bioinformatics methods, which identify similarity between proteins from their amino acid sequences or three-dimensional structures. A total of ~3600 drugs were compared against ~65,000 ligands organized into 246 targets from the MDL Drug Data Report database (Schuffenhauer et al. 2002), generating ~0.9 million drug-target comparisons (roughly 3600 × 246). Most of the drugs had no significant expectation values against most of the ligand sets. Among all possible pairs of drugs and ligand sets, ~6900 pairs were similar, with significant e-values. Predicted off-target proteins with strong similarity ensemble expectation values were then evaluated for novelty against the literature.
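The drug-versus-ligand-set comparison can be sketched in a few lines. The example below is a simplification under stated assumptions, not the published implementation: fingerprints are represented as plain sets of "on" bit positions, pairwise similarity is the Tanimoto coefficient, and the raw set score is the sum of pairwise similarities at or above a cutoff. The cutoff of 0.5, the toy fingerprints, and the function names are hypothetical. The step that converts a raw score into an e-value, by comparison with a background of random ligand sets, is deliberately omitted.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def raw_set_score(drug_fp, ligand_fps, threshold=0.5):
    """Sum of drug-vs-ligand similarities that meet the threshold.

    The threshold is an illustrative assumption, not a published value.
    """
    scores = (tanimoto(drug_fp, fp) for fp in ligand_fps)
    return sum(s for s in scores if s >= threshold)


# Hypothetical toy fingerprints for one drug and one target's ligand set.
drug = {1, 2, 3, 4}
ligand_set = [
    {1, 2, 3, 4},  # identical to the drug: similarity 1.0
    {1, 2, 3, 5},  # 3 shared bits out of 5 distinct: similarity 0.6
    {7, 8},        # nothing shared: similarity 0.0, below the cutoff
]
print(raw_set_score(drug, ligand_set))
```

Repeating this comparison for every drug against every target's ligand set yields the full drug-target comparison matrix. Only pairs whose scores are unlikely under a random background would then be reported as significant.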

4.4 Summary

Thanks to emerging technological innovations, low-cost sequencing and high-throughput technologies are generating a massive number of genomic datasets in biology and medicine. A large number of candidate disease genes have now been identified through GWAS and other approaches. Massive single-cell transcriptomic data are enabling us to precisely identify the cell types and associated gene expression signatures involved in different diseases. There is also a growing amount of data on FDA-approved drugs for treating diseases and on several other drugs that are not toxic to humans but failed to treat the diseases. Integrating the current single-cell transcriptomic, genotype-phenotype, pharmacogenomic, protein-protein interaction, and pathway datasets could ultimately lead to the identification of drug action mechanisms, disease mechanisms, and new uses of existing drugs. However, current methods for handling massive, high-dimensional genomic data (big data) are very limited. New statistical and computational methods are needed to handle rapidly growing, high-dimensional, and heterogeneous genomic datasets and to apply them to drug repurposing.

4

Computational Methods for Drug Repurposing
